perm filename STREAM.MSS[11,HE]1 blob
sn#656330 filedate 1982-04-29 generic text, type T, neo UTF8
@Part(streams, root "manual.mss")
@Section(The Edit-String Protocol)
@Comment(
Originally written by V. R. Pratt December, 1980
Converted to Scribe by Bill Nowicki May 1981
)
@Index(Streams)
@Index(Edit-Streams)
Edit-strings are a particular representation of strings designed to support a
variety of protocols for operating on strings. The most general string editing
operation, replacement, is supported, but certain special-purpose kinds of
operations such as stream-read-and-write have particularly efficient
implementations with edit-strings. All references to edit-strings are via
ordinary pointers to individual data within a string.
@SubSection(The Edit-String Data Structure)
Storage is divided into buffers each of size a power of two, each aligned by
its size. Each buffer contains a header and a body. The header is structured
into the following fields.
@Begin(Verbatim)
typedef struct Bufhead /* Buffer head definition */
{struct Buffer *next; /* next buffer */
struct Buffer *prev; /* previous buffer, if any */
char size; /* log base 2 of size of buffer */
char type; /* type of data in buffer */
short llim; /* left end of data in buffer */
short rlim; /* right end of data in buffer */
short refco; /* number of owners of this buffer */
} *bufhead;
@End(Verbatim)
The exact start and end of data in the body are defined by both delimiters and
the @t[llim/rlim] pair. The last delimiter before the start address marks
the start boundary, while the first delimiter following the end address marks
the end boundary.
@SubSection(Reference)
All references to edit-strings are via pointers to data within the body of some
buffer. This differs from schemes where a reference is a more complex structure
which may include a count of the remaining items in the current buffer and the
address of a function to call when the count vanishes.
Data are accessed via pointers in the usual way. In the case of accessing a
stream the pointer will be incremented after the access. This may take it out
of the data. The test for this is performed when accessing rather than when
incrementing. The condition of being out of the data is recognized when the
pointer points to a delimiter and the end address given in the buffer header
does not lie beyond the pointer. When storing at the end of an edit-string, as
when writing a stream, the writer only need update rlim if it writes a
delimiter.
@SubSection(Locating Block Headers)
The question arises as to how to locate the start of the buffer given only a
pointer into the buffer body. To this end all buffers are taken to be of size
a power of two, and are aligned on buffer boundaries, that is, they start
at an address which is a multiple of their length.
When the size of the buffer is known, the buffer header may be located by
masking out the low-order bits of the pointer. In effect a pointer has two
components, a pointer to the start of the buffer and an offset from that
start. Unlike other ways of forming two-element structures, this approach has
the benefit that the structure forms an ordinary machine address that can be
used as an indirect reference and incremented in the usual ways.
When the size of the buffer is not known, it may be determined by trying all
sizes in decreasing order and for each, masking out the low-order bits of the
pointer as in the case when the size is known. The largest size yielding a
header which (a) is within the storage region allocated to this scheme and (b)
contains this size in its size field, is the size of the containing buffer.
The correctness of this method depends on the observations that (a) every
address within a buffer yields the buffer header using SOME mask, and (b) every
mask larger than must yield some buffer header.
The advantage of this approach is that processes that know the buffer size in
advance can run faster, while processes that don't, such as postmortem
diagnosers, can still find their way around given only pointers into buffers.
@SubSection(Asynchronous Access)
The question now arises as to asynchronous reading and writing; can a reader
get a consistent picture of a string while it is being written? In general a
certain amount of explicit synchronization of independent processes may be
needed for some kinds of reading and writing. However for some of the cases
we are particularly concerned about, explicit synchronization can be avoided.
Consider the case of any number of readers reading in either direction in a
stream being written only at the ends (i.e. no inserting other than at the
ends, and no deleting). By symmetry it suffices to consider a reader
scanning forwards. A reader fetching a non-delimiter is assured that he is
not at the end of the stream. Fetching a delimiter is more problematical;
the delimiter may have been genuine at the time of the fetch, but may be
replaced by some other datum by the writer before the reader fetches the end
address.
If the writer has not updated the end address then no serious problem arises;
the reader merely has an out-of-date picture. However suppose that between the
reader's reading the delimiter and checking the end address that the writer
writes a non-delimiter And then the delimiter. The writer will then bring
the end address up to date and the reader will be fooled into believing that
the delimiter it read is a datum.
There is an easy cure for this problem. The reader identifying a delimiter
as a datum should always refetch that datum and discard the old datum. The
second fetch is guaranteed to be correct since all writing is performed at
the end. This method avOids the expense of explicit synchronization.
When the need does arise to synchronize explicitly, the question arises
as to how to minimize overhead. System calls to raise interrupt
priority or defeat the interrupt mechanisms are unduly expensive; it is
considerably cheaper to rely on shared semaphores when these can be set,
tested, and reset with single instructions.